Here we have four clusters. Cluster 0 (green) has lower salaries, mostly under $150k, with maximum years of experience in the 2-5 range; these are likely junior to mid-level employees with moderate pay. Cluster 1 (orange) has medium to high salaries spanning a wide range, $100k–$500k, but a narrow experience range of about 3 years, which suggests specialized or high-paying roles with short experience, possibly fast-track promotions or high-demand fields. Cluster 2 has low salaries and 0-4 years of experience; these are clearly entry-level employees. Cluster 3 has medium salaries, mostly under $200k, with more experience, about 6-13 years; these are probably senior professionals who have more experience but not the highest salaries.
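The salary and experience ranges quoted for each cluster can be checked with a per-cluster summary table. The following is a minimal sketch, not the analysis actually used here: the `cluster` column name is an assumption, and the toy rows are made-up values that only stand in for the real data.

```python
import pandas as pd

# Toy rows standing in for the real data; values are illustrative only.
# The 'cluster' column name is hypothetical (wherever the labels are stored).
toy = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 2, 2, 3, 3],
    "SALARY": [90_000, 140_000, 120_000, 450_000,
               50_000, 70_000, 150_000, 190_000],
    "MAX_YEARS_EXPERIENCE": [2, 5, 3, 3, 0, 4, 6, 13],
})

# One row per cluster: salary spread and experience range.
summary = toy.groupby("cluster").agg(
    salary_min=("SALARY", "min"),
    salary_median=("SALARY", "median"),
    salary_max=("SALARY", "max"),
    exp_min=("MAX_YEARS_EXPERIENCE", "min"),
    exp_max=("MAX_YEARS_EXPERIENCE", "max"),
)
print(summary)
```

Running the same aggregation on the real frame would show whether descriptions such as "mostly under $150k" hold for the bulk of each cluster or only for its median.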
Code
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import plotly.graph_objects as go

# Prepare features & target
features = eda[['MIN_YEARS_EXPERIENCE', 'MAX_YEARS_EXPERIENCE']].apply(pd.to_numeric, errors='coerce')
features = features.dropna()
X = features
y = eda.loc[X.index, 'SALARY']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=688)

# Fit model & predict
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Metrics (optional, but handy)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}, R²: {r2:.3f}")

# Define min/max for the identity line
min_val = y_test.min()
max_val = y_test.max()
MSE: 797238619.53, R²: 0.095
Code
import plotly.graph_objects as go

# Safety: make sure min_val and max_val cover both actual and predicted values
min_val = min(y_test.min(), y_pred.min())
max_val = max(y_test.max(), y_pred.max())

# Build figure
fig = go.Figure([
    go.Scatter(
        x=y_test,
        y=y_pred,
        mode='markers',
        marker=dict(color='skyblue', opacity=0.6, size=8),
        name='Predicted vs Actual',
        hovertemplate='Actual: %{x:.2f}<br>Predicted: %{y:.2f}<extra></extra>'
    ),
    go.Scatter(
        x=[min_val, max_val],
        y=[min_val, max_val],
        mode='lines',
        line=dict(color='red', width=2, dash='dash'),
        name='Ideal Fit',
        hoverinfo='skip'  # Don't show hover on the line
    )
])

# Update layout for better visuals
fig.update_layout(
    title='Actual vs Predicted Salary (Multiple Regression)',
    xaxis_title='Actual Salary ($)',
    yaxis_title='Predicted Salary ($)',
    width=700,
    height=500,
    template='plotly_white',
    legend=dict(title='Legend', x=0.02, y=0.98, bordercolor='Gray', borderwidth=1),
    margin=dict(l=60, r=40, t=60, b=60),
)

# Force equal scaling so that the ideal line is drawn as a true diagonal
fig.update_yaxes(scaleanchor="x", scaleratio=1)

# Save HTML
fig.write_html('figures/analytics_plot2.html', include_plotlyjs='cdn', full_html=False)
fig.show()
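This plot shows actual vs. predicted salary for the multiple linear regression model. The blue dots are individual test-set predictions, and the red dashed line is the ideal line where predicted = actual. A model that predicted salary well would place most points close to the red line, but the metrics reported above say otherwise: R² = 0.095 means the two experience features explain only about 9.5% of the variance in salary, and the MSE corresponds to a typical error of roughly $28,000 (RMSE). Predictions from a fit this weak tend to sit in a narrow band near the average salary, so years of experience alone are not enough to predict salary accurately.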
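To put the reported metrics on a dollar scale, the sketch below converts the MSE to an RMSE and uses the definition R² = 1 − MSE_model / MSE_baseline to back out the error of a baseline that always predicts the mean test salary. The two input numbers are copied from the printed output above; the variable names are illustrative.

```python
import math

# Metrics copied from the printed model output.
mse = 797238619.53
r2 = 0.095

# RMSE is in the same units as salary (dollars).
rmse = math.sqrt(mse)

# R² = 1 - MSE_model / MSE_baseline, where the baseline always predicts
# the mean of the test salaries. Rearranged to recover the baseline error.
baseline_rmse = math.sqrt(mse / (1 - r2))

# Fractional reduction in RMSE relative to the mean-only baseline.
improvement = 1 - rmse / baseline_rmse

print(f"RMSE: ${rmse:,.0f}")
print(f"Baseline RMSE (predict the mean): ${baseline_rmse:,.0f}")
print(f"RMSE reduction vs. baseline: {improvement:.1%}")
```

The small gap between the two RMSE values is another way of reading the low R²: the regression on experience alone is only slightly better than predicting the average salary for everyone.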